## Lesk Algorithm

#### Implementation of the Lesk algorithm for Word Sense Disambiguation

Word Sense Disambiguation (WSD) resolves the ambiguity that arises when a word has different meanings in different contexts.

There are different approaches to finding the most appropriate sense of a word; one of them is the Lesk algorithm.

Given a word in a context and a set of candidate senses, we have to find the sense in which the word is actually used.

In [1]:
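import nltk
from nltk.corpus import wordnet
from nltk.corpus import stopwords
# Requires NLTK data packages installed once via nltk.download(...):
# 'stopwords', 'wordnet', and the Punkt tokenizer data used by word_tokenize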

In [20]:
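sentence = "I went to the bank to deposit money"
word = 'bank'  # the ambiguous word

## Tokenize the given sentence
word_tokens = nltk.word_tokenize(sentence)

## Remove stop words from word_tokens
stop_words = set(stopwords.words('english'))  # a set gives fast membership tests

# 'token' (not 'word') avoids shadowing the ambiguous word defined above.
# The comparison is case-sensitive, so 'I' survives (NLTK's list stores 'i').
filtered_words = [token for token in word_tokens if token not in stop_words]
filtered_words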

['I', 'went', 'bank', 'deposit', 'money']
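NLTK's English stop-word list is lowercase, which is why `'I'` is still in the output above. A minimal sketch of case-folding before filtering is shown below; `STOP_WORDS` is a hypothetical, hand-picked subset of NLTK's list, and `preprocess` is an illustrative helper, not part of NLTK.

```python
# Assumption: a small hand-picked subset of NLTK's English stop words,
# which are stored in lowercase ('i', 'to', 'the', ...)
STOP_WORDS = {'i', 'to', 'the', 'a', 'an', 'and', 'of', 'in'}


def preprocess(sentence):
    """Split on whitespace, lowercase each token, and drop stop words."""
    tokens = sentence.split()  # the example sentence contains no punctuation
    return [t.lower() for t in tokens if t.lower() not in STOP_WORDS]


print(preprocess("I went to the bank to deposit money"))
# ['went', 'bank', 'deposit', 'money']
```

If the context is lowercased this way, the definitions it is compared against should be lowercased too, otherwise capitalized gloss words will never match.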

## WordNet

WordNet (https://wordnet.princeton.edu/) is a lexical database of English that groups words into sets of synonyms (synsets) and records the relations between them.

WordNet provides the following information about a word:
- Synonyms: words with a similar meaning.
- Hypernyms: more generic terms for a class (e.g. "vehicle" for "car").
- Hyponyms: more specific terms, i.e. members of a class (e.g. "car" for "vehicle").
- Holonyms: the whole of which the word names a part (e.g. "car" for "wheel").
- Meronyms: the parts of a whole (e.g. "wheel" for "car").

In [21]:
# Synset: a set of synonyms that share a common meaning.
# List every synset (sense) of the given word together with its definition
for sense in wordnet.synsets(word):
    print(sense, sense.definition())

Synset('bank.n.01') sloping land (especially the slope beside a body of water)
Synset('depository_financial_institution.n.01') a financial institution that accepts deposits and channels the money into lending activities
Synset('bank.n.03') a long ridge or pile
Synset('bank.n.04') an arrangement of similar objects in a row or in tiers
Synset('bank.n.05') a supply or stock held in reserve for future use (especially in emergencies)
Synset('bank.n.06') the funds held by a gambling house or the dealer in some gambling games
Synset('bank.n.07') a slope in the turn of a road or track; the outside is higher than the inside in order to reduce the effects of centrifugal force
Synset('savings_bank.n.02') a container (usually with a slot in the top) for keeping money at home
Synset('bank.n.09') a building in which the business of banking transacted
Synset('bank.n.10') a flight maneuver; aircraft tips laterally about its longitudinal axis (especially in turning)
Synset('bank.v.01') tip laterally
...

### WordNet lists all of the senses above for the word "bank". Now we have to find the most appropriate sense for our sentence.

We do this by measuring the overlap between the tokens of our sentence and the tokens of each definition (gloss) retrieved from WordNet: the sense whose definition shares the most words with the sentence is chosen. This is the core idea of the (simplified) Lesk algorithm.
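The overlap is a plain set intersection with exact string matching. The sketch below computes it for the financial sense; `overlap` is an illustrative helper, and the regex tokenizer is an assumption standing in for `nltk.word_tokenize` (they agree on this punctuation-free gloss).

```python
import re


def overlap(context_tokens, gloss):
    """Count distinct gloss words that also occur in the context (exact, case-sensitive match)."""
    # Assumption: a simple regex tokenizer replaces nltk.word_tokenize here
    gloss_tokens = set(re.findall(r"\w+", gloss))
    return len(gloss_tokens & set(context_tokens))


context = ['I', 'went', 'bank', 'deposit', 'money']  # filtered tokens from above
gloss = ("a financial institution that accepts deposits "
         "and channels the money into lending activities")

print(overlap(context, gloss))  # 1: only 'money' matches
```

Note that `'deposits'` in the gloss does not match `'deposit'` in the sentence; without stemming or lemmatization, inflected forms do not count toward the overlap.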

In [22]:
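# Compare the sentence with a single sense: the "trust" verb sense of bank
# (trust.v.01, "have confidence or faith in"), selected explicitly by name
trust_sense = wordnet.synset('trust.v.01')
print('tokenized original sentence -               ', filtered_words)
tokenized_trust = set(nltk.word_tokenize(trust_sense.definition()))
print('tokenized definition of the sense "trust" - ', tokenized_trust)

common_words = tokenized_trust.intersection(filtered_words)
print(len(common_words))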


tokenized original sentence -                ['I', 'went', 'bank', 'deposit', 'money']
tokenized definition of the sense "trust" -  {'faith', 'in', 'or', 'confidence', 'have'}
0


### In the example above, the sentence and this sense have no words in common. Let's run the same comparison for every WordNet sense of the word and keep the sense with the maximum overlap.

In [23]:
max_overlap = 0
best_sense = None

# Score every sense by how many definition tokens it shares with the sentence
for sense in wordnet.synsets(word):
    tokenized_sense = set(nltk.word_tokenize(sense.definition()))
    common_words = tokenized_sense.intersection(filtered_words)
    overlap = len(common_words)

    # Strict '>' keeps the first sense that reaches the maximum
    if overlap > max_overlap:
        max_overlap = overlap
        best_sense = sense

In [24]:
# So, the best sense of the word 'bank' is:
print(best_sense.definition())
print(max_overlap)

a financial institution that accepts deposits and channels the money into lending activities
1
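The winning overlap is only 1, and because the loop uses a strict `>`, the first sense that reaches the maximum is kept. A sketch that scores every gloss and reports all tied senses makes this visible. It uses a subset of the definitions printed earlier, and `SENSES`, `score_senses` and `best_senses` are hypothetical names; the regex tokenizer again stands in for `nltk.word_tokenize`.

```python
import re

# A subset of the WordNet definitions printed earlier in this notebook
SENSES = {
    'bank.n.01': 'sloping land (especially the slope beside a body of water)',
    'depository_financial_institution.n.01':
        'a financial institution that accepts deposits and channels the money into lending activities',
    'bank.n.03': 'a long ridge or pile',
    'savings_bank.n.02': 'a container (usually with a slot in the top) for keeping money at home',
    'bank.n.09': 'a building in which the business of banking transacted',
}


def score_senses(context_tokens, senses):
    """Map each sense name to its gloss/context overlap."""
    context = set(context_tokens)
    return {name: len(set(re.findall(r"\w+", gloss)) & context)
            for name, gloss in senses.items()}


def best_senses(context_tokens, senses):
    """Return every sense sharing the top score, plus that score."""
    scores = score_senses(context_tokens, senses)
    top = max(scores.values())
    return [name for name, s in scores.items() if s == top], top


winners, top = best_senses(['I', 'went', 'bank', 'deposit', 'money'], SENSES)
print(top, winners)
# 1 ['depository_financial_institution.n.01', 'savings_bank.n.02']
```

Both the financial sense and the money-box sense share only `'money'` with the sentence, so the financial sense wins here because it comes first in WordNet's ordering, not because it scored higher.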


## We can also use NLTK's built-in implementation of Lesk.

In [25]:
from nltk.wsd import lesk

In [28]:
# Let's use NLTK's Lesk to find the most appropriate sense
lesk(sentence.split(), "bank").definition()

'a container (usually with a slot in the top) for keeping money at home'
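NLTK picked the money-box sense rather than the financial one. One likely reason is that `sentence.split()` keeps stop words, so words like "the" and "to" count toward the overlap. The sketch below assumes, as in recent NLTK versions, that `lesk` compares `set(context_sentence)` with `definition().split()`; it reproduces that comparison for three glosses printed earlier, with and without stop words. `glosses` and `scores` are illustrative names.

```python
raw_context = set("I went to the bank to deposit money".split())  # what lesk received
filtered_context = {'I', 'went', 'bank', 'deposit', 'money'}      # stop words removed

# Definitions copied from the WordNet listing earlier in this notebook
glosses = {
    'depository_financial_institution.n.01':
        'a financial institution that accepts deposits and channels the money into lending activities',
    'savings_bank.n.02':
        'a container (usually with a slot in the top) for keeping money at home',
    'bank.n.07':
        'a slope in the turn of a road or track; the outside is higher than the inside '
        'in order to reduce the effects of centrifugal force',
}

scores = {}
for name, gloss in glosses.items():
    gloss_tokens = set(gloss.split())  # whitespace split, no stop-word removal
    scores[name] = (len(raw_context & gloss_tokens), len(filtered_context & gloss_tokens))
    print(f"{name:40} raw={scores[name][0]} filtered={scores[name][1]}")
```

With the raw context, all three senses score 2, and the road-slope sense gets its points only from "the" and "to". Removing stop words drops it to 0, but the financial and money-box senses still tie at 1, so which one `lesk` returns depends on tie-breaking rather than on meaning. NLTK's `lesk` also accepts optional `pos` and `synsets` arguments to restrict the candidate senses.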