## Word-Sense Disambiguation

In [110]:
from nltk.corpus import wordnet as wn
from nltk import wsd

In [111]:
X = 'The die is cast.'
Y = 'Roll the die to get a 6.'
Z = 'What is dead may never die.'

In [112]:
A = 'Shu-Ren booked a flight to Taiwan.'
B = 'Shu-Ren read a book yesterday.'
C = 'Shu-Ren planned to write a book chapter'

In [113]:
wn.synsets('die')

[Synset('die.n.01'),
 Synset('die.n.02'),
 Synset('die.n.03'),
 Synset('die.v.01'),
 Synset('die.v.02'),
 Synset('die.v.03'),
 Synset('fail.v.04'),
 Synset('die.v.05'),
 Synset('die.v.06'),
 Synset('die.v.07'),
 Synset('die.v.08'),
 Synset('die.v.09'),
 Synset('die.v.10'),
 Synset('die.v.11')]

In [114]:
wn.synsets("book")

[Synset('book.n.01'),
 Synset('book.n.02'),
 Synset('record.n.05'),
 Synset('script.n.01'),
 Synset('ledger.n.01'),
 Synset('book.n.06'),
 Synset('book.n.07'),
 Synset('koran.n.01'),
 Synset('bible.n.01'),
 Synset('book.n.10'),
 Synset('book.n.11'),
 Synset('book.v.01'),
 Synset('reserve.v.04'),
 Synset('book.v.03'),
 Synset('book.v.04')]

In [115]:
wn.synsets('die', pos=wn.NOUN)

[Synset('die.n.01'), Synset('die.n.02'), Synset('die.n.03')]

In [116]:
wn.synsets("book", pos=wn.NOUN)

[Synset('book.n.01'),
 Synset('book.n.02'),
 Synset('record.n.05'),
 Synset('script.n.01'),
 Synset('ledger.n.01'),
 Synset('book.n.06'),
 Synset('book.n.07'),
 Synset('koran.n.01'),
 Synset('bible.n.01'),
 Synset('book.n.10'),
 Synset('book.n.11')]

In [117]:
for syn in wn.synsets('die', pos=wn.NOUN):
    print(syn.definition())

a small cube with 1 to 6 spots on the six faces; used in gambling to generate random numbers
a device used for shaping metal
a cutting tool that is fitted into a diestock and used for cutting male (external) screw threads on screws or bolts or pipes or rods


In [118]:
for syn in wn.synsets('book', pos=wn.VERB):
    print(syn.definition())

engage for a performance
arrange for and reserve (something for someone else) in advance
record a charge in a police register
register in a hotel booker


In [119]:
for syn in wn.synsets('die', pos=wn.VERB):
    print(syn.definition())

pass from physical life and lose all bodily attributes and functions necessary to sustain life
suffer or face the pain of death
be brought to or as if to the point of death by an intense emotion such as embarrassment, amusement, or shame
stop operating or functioning
feel indifferent towards
languish as with love or desire
cut or shape with a die
to be on base at the end of an inning, of a player
lose sparkle or bouquet
disappear or come to an end
suffer spiritual death; be damned (in the religious sense)


## Word-Sense Disambiguation with the Lesk Algorithm

In [120]:
print(X)
wsd.lesk(X.split(), 'die')

The die is cast.


Synset('die.v.07')

In [121]:
_.definition()

'cut or shape with a die'

In [122]:
print(A)
wsd.lesk(A.split(), "book")

Shu-Ren booked a flight to Taiwan.


Synset('script.n.01')

In [123]:
_.definition()

'a written version of a play or other dramatic composition; used in preparing for a performance'

In [124]:
wsd.lesk(X.split(), 'die', pos=wn.NOUN).definition()

'a cutting tool that is fitted into a diestock and used for cutting male (external) screw threads on screws or bolts or pipes or rods'

In [125]:
wsd.lesk(A.split(), "book", pos=wn.NOUN).definition()

'a written version of a play or other dramatic composition; used in preparing for a performance'

In [126]:
print(Y)
wsd.lesk(Y.split(), 'die').definition()

Roll the die to get a 6.


'to be on base at the end of an inning, of a player'

In [127]:
wsd.lesk(Y.split(), 'die', pos=wn.NOUN).definition()

'a small cube with 1 to 6 spots on the six faces; used in gambling to generate random numbers'
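The noun-restricted results above can be reproduced by hand. The sketch below assumes that `wsd.lesk` behaves like a simplified Lesk: it counts the tokens shared between the context and each candidate definition and returns the sense with the largest count. The names `GLOSSES`, `overlap_scores` and `toy_lesk` are hypothetical helpers for illustration, not part of NLTK; the glosses are copied from the noun definitions of "die" printed earlier.

```python
# Toy sense inventory: the three noun definitions of "die" shown earlier.
GLOSSES = {
    'die.n.01': 'a small cube with 1 to 6 spots on the six faces; '
                'used in gambling to generate random numbers',
    'die.n.02': 'a device used for shaping metal',
    'die.n.03': 'a cutting tool that is fitted into a diestock and used for '
                'cutting male (external) screw threads on screws or bolts '
                'or pipes or rods',
}


def overlap_scores(context_tokens, glosses):
    """Count the distinct tokens shared by the context and each gloss."""
    context = set(context_tokens)
    return {name: len(context & set(gloss.split()))
            for name, gloss in glosses.items()}


def toy_lesk(context_tokens, glosses):
    """Return the sense label with the largest overlap.

    Ties are broken by label here; that tie-break is an assumption of
    this sketch, not a documented NLTK guarantee.
    """
    scores = overlap_scores(context_tokens, glosses)
    return max(scores, key=lambda name: (scores[name], name))


print(overlap_scores('Roll the die to get a 6.'.split(), GLOSSES))
print(toy_lesk('Roll the die to get a 6.'.split(), GLOSSES))
print(toy_lesk('The die is cast.'.split(), GLOSSES))
```

Under these assumptions the gambling sense wins for "Roll the die to get a 6." only because of the function words "a", "to" and "the", and the cutting-tool sense wins for "The die is cast." because of the single shared token "is". Content words play no part in either decision.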

In [128]:
print(Z)
wsd.lesk(Z.split(), 'die').definition()

What is dead may never die.


'a cutting tool that is fitted into a diestock and used for cutting male (external) screw threads on screws or bolts or pipes or rods'

In [129]:
wsd.lesk(Z.split(), 'die', pos=wn.VERB).definition()

'stop operating or functioning'
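All the calls above pass `sentence.split()` as the context, so tokens such as `'6.'`, `'cast.'` and `'The'` keep their punctuation and capitalization. Since Lesk-style scoring counts words shared between the context and a definition, those tokens can never match. The sketch below shows the effect on one sentence and one gloss; `normalize` is a hypothetical helper, and its regular expression is only one of many reasonable tokenization choices.

```python
import re


def normalize(text):
    """Lowercase the text and keep runs of letters, digits, apostrophes or hyphens."""
    return re.findall(r"[a-z0-9'-]+", text.lower())


sentence = 'Roll the die to get a 6.'
gloss = ('a small cube with 1 to 6 spots on the six faces; '
         'used in gambling to generate random numbers')

# Whitespace splitting leaves '6.' in the context, which cannot match '6'.
raw_overlap = set(sentence.split()) & set(gloss.split())
# After normalization the digit matches as well.
clean_overlap = set(normalize(sentence)) & set(normalize(gloss))

print(sorted(raw_overlap))    # ['a', 'the', 'to']
print(sorted(clean_overlap))  # ['6', 'a', 'the', 'to']
```

Passing `normalize(sentence)` as the context to `wsd.lesk` would clean only the context side. Whether the definitions are normalized too depends on NLTK's internals, so treat any improvement as something to verify rather than assume.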

## Automatic POS Tagging + Lesk with spaCy

In [130]:
%pip install spacy

Note: you may need to restart the kernel to use updated packages.


In [131]:
from spacy.cli import download
from spacy import load
# Uncomment on the first run to fetch the small English model.
# download('en_core_web_sm')
nlp = load('en_core_web_sm')

In [132]:
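import warnings

# Map spaCy's coarse POS tags to WordNet POS constants.
POS_MAP = {
    'VERB': wn.VERB,
    'NOUN': wn.NOUN,
    'PROPN': wn.NOUN,
}


def lesk(doc, word):
    """Run NLTK's Lesk on `word`, using spaCy's POS tag to narrow the candidate synsets."""
    # Find the first token whose text matches `word` exactly (case-sensitive).
    token = next((t for t in doc if t.text == word), None)
    if token is None:
        raise ValueError(f'Word "{word}" does not appear in the document: {doc.text}.')
    # Use None (not False) for unmapped tags: it is wsd.lesk's default for "no POS filter".
    pos = POS_MAP.get(token.pos_)
    if pos is None:
        warnings.warn(f'POS tag {token.pos_} for "{token.text}" has no WordNet mapping. '
                      'Falling back to default Lesk behaviour.')
    context = [t.text for t in doc]
    return wsd.lesk(context, token.text, pos=pos)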
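`POS_MAP` covers only verbs and nouns, so any other tag (adjectives, adverbs, pronouns, and so on) falls through to the unfiltered Lesk call. A wider mapping could be sketched as follows. To keep the block self-contained it uses the single-letter strings that WordNet's constants stand for (`wn.NOUN == 'n'`, `wn.VERB == 'v'`, `wn.ADJ == 'a'`, `wn.ADV == 'r'`). `WIDER_POS_MAP` and `to_wordnet_pos` are hypothetical names, and mapping `AUX` to verbs is a judgment call rather than an established convention.

```python
# spaCy coarse (Universal POS) tags mapped to WordNet POS letters.
WIDER_POS_MAP = {
    'NOUN': 'n',
    'PROPN': 'n',
    'VERB': 'v',
    'AUX': 'v',   # assumption: treat auxiliaries as verbs
    'ADJ': 'a',   # caveat: WordNet also has satellite adjectives with POS 's'
    'ADV': 'r',
}


def to_wordnet_pos(spacy_pos):
    """Return the WordNet POS letter for a spaCy tag, or None if unmapped."""
    return WIDER_POS_MAP.get(spacy_pos)


for tag in ('NOUN', 'ADJ', 'ADV', 'PRON'):
    print(tag, to_wordnet_pos(tag))
```

Returning `None` for unmapped tags such as `PRON` keeps the fallback explicit: the caller can warn and then run Lesk over every synset of the word.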

In [133]:
doc = nlp('Roll the die to get a 6.')

In [134]:
lesk(doc, 'die')

Synset('die.n.01')

In [135]:
lesk(doc, 'die').definition()

'a small cube with 1 to 6 spots on the six faces; used in gambling to generate random numbers'

In [136]:
lesk(nlp('I work at google.'), 'google').definition()

"a widely used search engine that uses text-matching techniques to find web pages that are important and relevant to a user's search"

In [137]:
lesk(nlp('I will google it.'), 'google').definition()

'search the internet (for information) using the Google search engine'

In [138]:
lesk(nlp('My name is Dr. Shu-Ren Chang?'), 'Dr.').definition()

'a person who holds Ph.D. degree (or the equivalent) from an academic institution'

In [139]:
lesk(nlp('My name is Dr. Shu-Ren Chang?'), 'Chang').definition()

'the longest river of Asia; flows eastward from Tibet into the East China Sea near Shanghai'

In [140]:
lesk(nlp('Who is Dr. Shu-Ren Chang?'), 'Who').definition()



'a United Nations agency to coordinate international health activities and to help governments improve health services'

In [142]:
doc = nlp('I have to book a flight')

In [145]:
lesk(doc, 'bok')

ValueError: Word "bok" does not appear in the document: I have to book a flight.

In [146]:
lesk(doc, 'Shu-Ren')

ValueError: Word "Shu-Ren" does not appear in the document: I have to book a flight.

In [147]:
lesk(doc, 'book').definition()

'register in a hotel booker'

In [148]:
doc = nlp('I am reading a book.')

In [149]:
lesk(doc, 'book').definition()

'a written version of a play or other dramatic composition; used in preparing for a performance'

In [159]:
doc1 = doc

In [160]:
doc1 = nlp('Please book a flight for me')

In [161]:
lesk(doc1, 'book').definition()

'engage for a performance'