# **Word Sense Disambiguation**

Words can have different meanings in different contexts. When the intended
meaning of a word is unclear, miscommunication can result. A word with multiple meanings exhibits word
sense ambiguity. Syntactic ambiguity is typically resolved with part-of-speech (POS)
tagging, whereas semantic ambiguity is resolved with word sense disambiguation (WSD).
The challenge is to identify which meaning a word carries in a given context [[1]](#scrollTo=fPge5oRLQwid).

This notebook shows some basic WSD examples with the ``pywsd`` library.

## **``pywsd``**
``pywsd`` is a Python library that provides WSD functions, including several variants of the Lesk algorithm [[1]](#scrollTo=fPge5oRLQwid).

For more details about ``pywsd``, see [[4]](https://pypi.org/project/pywsd/).

### Import libraries

#### Install ``pywsd``

In [2]:
pip install pywsd  

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


#### Install ``wn``
``wn`` is a new Python library for working with wordnets. Unlike previous libraries, ``wn`` is built from the beginning to accommodate multiple wordnets (for multiple languages or multiple versions of the same wordnet) while retaining the ability to query and traverse them independently. For more detail about the ``wn`` library, please refer to [[5]](https://pypi.org/project/wn/) and [[6]](https://aclanthology.org/2021.gwc-1.12/).



In [3]:
pip install wn==0.0.22

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


#### Import ``nltk`` and ``wordnet``

``nltk`` (Natural Language Toolkit) is an open-source Python library for natural language processing. For more details about ``nltk``, see [[2]](https://www.nltk.org/api/nltk.html#nltk.wsd.lesk).

WordNet, available in ``nltk`` as the ``wordnet`` corpus, is a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations [[3]](http://www.nltk.org/howto/wsd.html).
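
To make the idea of synsets more concrete, the following is a minimal sketch of the data structure in plain Python. It is not the WordNet API: `ToySynset`, `TOY_WORDNET`, `synsets_for`, and `hypernym_chain` are hypothetical names, and the entries are hand-written toy data only loosely modeled on WordNet. It shows that one word can belong to several synsets (one per sense), that one synset groups several synonyms, and that synsets can be linked to more general synsets.

```python
from dataclasses import dataclass, field


@dataclass
class ToySynset:
    """A hypothetical, simplified stand-in for a WordNet synset."""
    name: str
    lemmas: list
    definition: str
    hypernyms: list = field(default_factory=list)  # names of more general synsets


# Hand-written toy entries (an assumption for illustration, not real WordNet data).
TOY_WORDNET = {
    s.name: s
    for s in [
        ToySynset("vehicle", ["vehicle"], "a conveyance that transports people or objects"),
        ToySynset("car", ["car", "auto", "automobile"], "a motor vehicle with four wheels", ["vehicle"]),
        ToySynset("bank_institution", ["bank", "banking company"], "a financial institution that accepts deposits"),
        ToySynset("bank_slope", ["bank"], "sloping land beside a body of water"),
    ]
}


def synsets_for(word):
    """Return the names of all toy synsets that list `word` among their lemmas."""
    return [s.name for s in TOY_WORDNET.values() if word in s.lemmas]


def hypernym_chain(name):
    """Follow the first hypernym link upward until a synset without hypernyms is reached."""
    chain = [name]
    while TOY_WORDNET[chain[-1]].hypernyms:
        chain.append(TOY_WORDNET[chain[-1]].hypernyms[0])
    return chain


print(synsets_for("bank"))     # two senses of the same word
print(synsets_for("auto"))     # a synonym maps to the same synset as "car"
print(hypernym_chain("car"))   # walk from a synset to its more general concept
```

An ambiguous word such as "bank" is exactly a word for which `synsets_for` returns more than one entry; WSD has to choose among them.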


In [5]:
# Import the nltk module
import nltk

# Download the "wordnet" corpus with nltk
nltk.download('wordnet')

# The "averaged_perceptron_tagger" resource is used for POS tagging
nltk.download('averaged_perceptron_tagger')

# The "punkt" resource is used for tokenizing sentences
nltk.download('punkt')

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Unzipping corpora/wordnet.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

#### Import ``simple_lesk``

The Lesk algorithm is a knowledge-based method: it selects the sense whose dictionary definition overlaps most with the words surrounding the ambiguous word. The approach is based on the assumption that words used together are also related to each other [[1]](#scrollTo=fPge5oRLQwid).
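
The overlap idea can be sketched in a few lines of plain Python. This is a simplified illustration, not the implementation used by ``pywsd``: `SENSES`, `STOPWORDS`, `tokens`, and `toy_lesk` are hypothetical names, the two glosses are the definitions printed later in this notebook, and the short example sentences attached to each sense are invented here to enlarge the signature of each sense.

```python
import re

# Toy sense inventory (an assumption for illustration, not WordNet itself).
# Each sense has a gloss plus one invented example sentence.
SENSES = {
    "bank (financial institution)": [
        "a financial institution that accepts deposits and channels the money into lending activities",
        "he cashed a check at the bank",
    ],
    "bank (sloping land)": [
        "sloping land (especially the slope beside a body of water)",
        "he sat on the bank of the river",
    ],
}

# A tiny hand-picked stopword list, sufficient for this example only.
STOPWORDS = {"a", "and", "at", "he", "i", "into", "my", "of", "on", "that", "the", "to", "was"}


def tokens(text):
    """Lowercase the text, keep alphabetic words, and drop stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def toy_lesk(context, target, senses=SENSES):
    """Return the sense whose signature shares the most words with the context."""
    context_words = tokens(context) - {target}

    def overlap(sense):
        signature = set().union(*(tokens(t) for t in senses[sense]))
        return len(context_words & signature)

    return max(senses, key=overlap)


print(toy_lesk("I went to the bank to deposit my money", "bank"))
# → bank (financial institution)
print(toy_lesk("The river bank was full of dead fishes", "bank"))
# → bank (sloping land)
```

In this sketch the first sentence matches the financial sense through the shared word "money", and the second matches the sloping-land sense through "river". With so few words per sense, ties and zero overlaps are common, which is why practical variants enrich each sense with more related words.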

In [6]:
# simple_lesk returns the sense that best fits the given word in its context, according to the Simple Lesk algorithm
from pywsd.lesk import simple_lesk

### WSD application examples

#### Bank

In [7]:
# Create a sample text that contains two sentences
text1 = ['I went to the bank to deposit my money', 'The river bank was full of dead fishes']

# Analyze the first sentence and print the definition of the word "bank"
print("=============== analyse sentence 1 =================\n")
print("Context-1:", text1[0])
answer1 = simple_lesk(text1[0], 'bank')
print("Sense:", answer1)
print("Definition : ", answer1.definition())

# Analyze the second sentence and print the definition of the word "bank"
print("\n\n=============== analyse sentence 2 =================\n")
print("Context-2:", text1[1])
answer2 = simple_lesk(text1[1], 'bank')
print("Sense:", answer2)
print("Definition : ", answer2.definition())

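# Optional: list every candidate sense of "bank"
# (requires `from nltk.corpus import wordnet as wn`)
# for s in wn.synsets('bank'):
#     print('\t', s, s.definition())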


Context-1: I went to the bank to deposit my money
Sense: Synset('depository_financial_institution.n.01')
Definition :  a financial institution that accepts deposits and channels the money into lending activities



Context-2: The river bank was full of dead fishes
Sense: Synset('bank.n.01')
Definition :  sloping land (especially the slope beside a body of water)


#### Plant

In [44]:
# Create a sample text that contains two sentences
text2 = ['The workers at the industrial plant were overworked.', 'The plant was no longer bearing flowers.']

# Analyze the first sentence and print the definition of the word "plant"
print("=============== analyse sentence 1 =================\n")
print("Context-1:", text2[0])
answer1 = simple_lesk(text2[0], 'plant')
print("Sense:", answer1)
print("Definition : ", answer1.definition())

# Analyze the second sentence and print the definition of the word "plant"
print("\n\n=============== analyse sentence 2 =================\n")
print("Context-2:", text2[1])
answer2 = simple_lesk(text2[1], 'plant')
print("Sense:", answer2)
print("Definition : ", answer2.definition())


Context-1: The workers at the industrial plant were overworked.
Sense: Synset('plant.n.01')
Definition :  buildings for carrying on industrial labor



Context-2: The plant was no longer bearing flowers.
Sense: Synset('plant.v.01')
Definition :  put or set (seeds, seedlings, or plants) into the ground
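
The second result deserves a closer look: the sentence uses "plant" as a noun, yet the returned sense is `plant.v.01`, a verb sense. Synset names follow the pattern `lemma.pos.number`, so the part of speech can be read directly from the name. The helper below is a minimal sketch in plain Python; `POS_NAMES` and `describe` are hypothetical names introduced for illustration and are not part of ``pywsd`` or ``nltk``.

```python
# WordNet part-of-speech codes as they appear in synset names.
POS_NAMES = {
    "n": "noun",
    "v": "verb",
    "a": "adjective",
    "s": "adjective satellite",
    "r": "adverb",
}


def describe(synset_name):
    """Split a name such as 'plant.v.01' into lemma, POS label, and sense number."""
    lemma, pos, number = synset_name.rsplit(".", 2)
    return lemma, POS_NAMES[pos], int(number)


# Synset names copied from the outputs above.
for name in ["bank.n.01", "plant.n.01", "plant.v.01"]:
    print(name, "->", describe(name))
```

When the part of speech of the ambiguous word is known, it can be passed to `simple_lesk` as the third argument, as the Fair example does with `'n'`, so that only senses with that part of speech are considered.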


#### Fair

In [43]:
# Create a sample text that contains two sentences
text3 = ['Everyone needs to be given a fair chance in the competition.', 'The annual fair in our city is next weekend.']

# Analyze the first sentence and print the definition of the word "fair"
print("=============== analyse sentence 1 =================\n")
print("Context-1:", text3[0])
answer1 = simple_lesk(text3[0], 'fair')
print("Sense:", answer1)
print("Definition : ", answer1.definition())

# Analyze the second sentence and print the definition of the word "fair"
# pos='n' restricts the candidate senses to nouns
print("\n\n=============== analyse sentence 2 =================\n")
print("Context-2:", text3[1])
answer2 = simple_lesk(text3[1], 'fair', pos='n')
print("Sense:", answer2)
print("Definition : ", answer2.definition())


Context-1: Everyone needs to be given a fair chance in the competition.
Sense: Synset('honest.s.07')
Definition :  gained or earned without cheating or stealing



Context-2: The annual fair in our city is next weekend.
Sense: Synset('fair.n.03')
Definition :  a competitive exhibition of farm products


# **References**

- [1] NLP and Computer Vision_DLMAINLPCV01 Lecture Book
- [2] https://www.nltk.org/api/nltk.html#nltk.wsd.lesk
- [3] http://www.nltk.org/howto/wsd.html
- [4] https://pypi.org/project/pywsd/
- [5] https://pypi.org/project/wn/
- [6] https://aclanthology.org/2021.gwc-1.12/


Copyright © 2022 IU International University of Applied Sciences